The development of deep learning based image representation learning (IRL) methods has attracted great attention in the context of remote sensing (RS) image understanding. Most of these methods require a high quantity and quality of annotated training images, which can be time-consuming and costly to gather. To reduce labeling costs, publicly available thematic maps, automatic labeling procedures or crowdsourced data can be used. However, such approaches increase the risk of including label noise in the training data, which may result in overfitting on noisy labels when discriminative reasoning is employed, as in most existing methods. This leads to sub-optimal learning procedures and thus to inaccurate characterization of RS images. In this paper, for the first time in RS, we introduce a generative reasoning integrated label noise robust representation learning (GRID) approach. GRID aims to model the complementary characteristics of discriminative and generative reasoning for IRL under noisy labels. To this end, we first integrate generative reasoning into discriminative reasoning through a variational autoencoder. This allows our approach to automatically detect training samples with noisy labels. Then, through our label noise robust hybrid representation learning strategy, GRID adjusts the whole learning procedure to employ generative reasoning for these samples and discriminative reasoning for the remaining samples. Our approach learns discriminative image representations while preventing noisy labels from interfering with training, independently of the underlying IRL method. Thus, unlike the existing methods, GRID does not depend on the type of annotation, label noise, neural network, loss or learning task, and can therefore be utilized for various RS image understanding problems. Experimental results show the effectiveness of GRID compared to state-of-the-art methods.
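The hybrid routing between discriminative and generative reasoning can be pictured with a small numpy sketch; using the VAE reconstruction error as the noise score and a fixed threshold are illustrative assumptions here, not the paper's detection mechanism:

```python
import numpy as np

def hybrid_loss(ce_losses, recon_errors, threshold):
    """Route each sample to discriminative or generative learning.

    ce_losses:    per-sample cross-entropy (discriminative) losses
    recon_errors: per-sample VAE reconstruction errors, used here as a
                  proxy noise score (assumed: samples the generative model
                  fits poorly under their given label are flagged noisy)
    threshold:    noise-score cutoff (a free parameter in this sketch)
    """
    ce_losses = np.asarray(ce_losses, dtype=float)
    recon_errors = np.asarray(recon_errors, dtype=float)
    noisy = recon_errors > threshold  # detected noisy-label samples
    # Clean samples contribute the discriminative term, noisy samples only
    # the generative (reconstruction) term, so wrong labels never enter CE.
    loss = np.where(noisy, recon_errors, ce_losses)
    return loss.mean(), noisy

total, noisy_mask = hybrid_loss([0.2, 2.5, 0.3], [0.1, 0.9, 0.2], threshold=0.5)
```

The key property the sketch illustrates is that a sample's (possibly wrong) label only influences training when the sample is judged clean.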
The use of deep neural networks (DNNs) has recently attracted great attention in the framework of the multi-label classification (MLC) of remote sensing (RS) images. To optimize the large number of parameters of DNNs, a high number of reliable training images annotated with multi-labels is often required. However, the collection of a large training set is time-consuming, complex and costly. To minimize annotation efforts for data-demanding DNNs, in this paper we present several query functions for active learning (AL) in the context of DNNs for the MLC of RS images. Unlike the AL query functions defined for single-label classification or semantic segmentation problems, each query function presented in this paper is based on the evaluation of two criteria: i) multi-label uncertainty; and ii) multi-label diversity. The multi-label uncertainty criterion is associated with the confidence of the DNNs in correctly assigning multi-labels to each image. To assess multi-label uncertainty, we present and adapt to MLC problems three strategies: i) learning multi-label loss ordering; ii) measuring temporal discrepancy of multi-label predictions; and iii) measuring the magnitude of approximated gradient embeddings. The multi-label diversity criterion aims at selecting a set of uncertain images that are as diverse as possible to reduce the redundancy among them. To assess this criterion, we exploit a clustering-based strategy. We combine each of the above-mentioned uncertainty strategies with the clustering-based diversity strategy, resulting in three different query functions. Experimental results obtained on two benchmark archives show that our query functions select a highly informative set of samples at each iteration of the AL process in the context of MLC.
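The general pattern of combining an uncertainty criterion with clustering-based diversity can be illustrated with a small numpy sketch; the tiny k-means, the candidate-pool size, and all function names are illustrative assumptions rather than the query functions proposed in the paper:

```python
import numpy as np

def select_batch(uncertainty, features, n_select, n_candidates, seed=0):
    """Pick a diverse batch among the most uncertain unlabeled samples.

    1) keep the n_candidates most uncertain samples;
    2) cluster their features into n_select groups (tiny k-means);
    3) from each cluster take its most uncertain member.
    """
    uncertainty = np.asarray(uncertainty, float)
    features = np.asarray(features, float)
    cand = np.argsort(uncertainty)[-n_candidates:]  # uncertainty step
    X = features[cand]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_select, replace=False)]
    for _ in range(10):  # a few Lloyd iterations suffice for a sketch
        d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(n_select):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(0)
    picked = []  # diversity step: one representative per cluster
    for k in range(n_select):
        members = cand[assign == k]
        if len(members):
            picked.append(int(members[np.argmax(uncertainty[members])]))
    return sorted(picked)
```

In the paper's setting, `uncertainty` would come from one of the three multi-label strategies (loss ordering, temporal discrepancy, or gradient-embedding magnitude).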
The constantly growing operational capability of global Earth observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is severely restricted by their huge sizes and the limited exploration capabilities offered by EO platforms. To address this limitation, we recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a deep hashing network based on metric learning that encodes high-dimensional image features into compact binary hash codes. We use these codes as keys in a hash table to enable real-time nearest-neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and typical queries over satellite image repositories. Demo visitors will interact with EarthQube in the role of different users who search for images by their semantic content and apply additional filters.
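The hash-table retrieval idea (binary codes as keys, with probes within a small Hamming radius) can be sketched as follows; the sign-thresholding binarization and the one-bit probing scheme are simplifying assumptions, not MiLaN's learned hashing:

```python
import numpy as np

def binarize(features):
    """Sign-threshold real-valued features into binary hash codes."""
    return (np.asarray(features) > 0).astype(np.uint8)

def build_table(codes):
    """Hash table: code bytes -> list of image indices."""
    table = {}
    for i, c in enumerate(codes):
        table.setdefault(c.tobytes(), []).append(i)
    return table

def query(table, code, n_bits, radius=1):
    """Exact-match lookup plus probes within a small Hamming radius."""
    hits = list(table.get(code.tobytes(), []))
    if radius >= 1:
        for b in range(n_bits):  # flip one bit at a time
            probe = code.copy()
            probe[b] ^= 1
            hits += table.get(probe.tobytes(), [])
    return sorted(hits)
```

Because lookup cost depends only on the code length (not the archive size), this structure is what makes real-time search over very large archives feasible.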
The development of accurate methods for multi-label classification (MLC) of remote sensing (RS) images is one of the most important research topics in RS. To address MLC problems, deep learning methods that require a large number of reliable training images annotated with multiple land-cover class labels (multi-labels) have become popular in RS. However, collecting such annotations is time-consuming and costly. A common procedure to obtain annotations at zero labeling cost is to rely on thematic products or crowdsourced labels. As a drawback, these procedures carry the risk of label noise that can distort the learning process of MLC algorithms. In the literature, most label-noise-robust methods are designed for single-label classification (SLC) problems in computer vision (CV), where each image is annotated by a single label. Unlike SLC, label noise in MLC can be associated with: 1) subtractive label noise (a land-cover class label is not assigned to an image although the class is present in the image); 2) additive label noise (a land-cover class label is assigned to an image although the class is not present in the image); and 3) mixed label noise (a combination of both). In this paper, we investigate three different noise-robust CV SLC methods and adapt them to be robust to multi-label noise scenarios in RS. During the experiments, we study the effects of different types of multi-label noise and rigorously evaluate the adapted methods. To this end, we also introduce a synthetic multi-label noise injection strategy that is more adequate for simulating operational scenarios than the uniform label noise injection strategy, in which the labels of absent and present classes are flipped with uniform probability. Furthermore, we study the relevance of different evaluation metrics in MLC problems under noisy multi-labels.
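The distinction between additive and subtractive multi-label noise can be made concrete with a minimal numpy injection sketch, where the uniform strategy is simply the special case of equal flip probabilities (the function name and interface are ours):

```python
import numpy as np

def inject_multilabel_noise(y, p_add, p_sub, seed=0):
    """Flip entries of a multi-hot label matrix y (images x classes).

    Present classes are dropped with probability p_sub (subtractive noise);
    absent classes are added with probability p_add (additive noise).
    Uniform label noise is the special case p_add == p_sub; mixed noise
    is any combination where both probabilities are non-zero.
    """
    y = np.asarray(y, dtype=np.uint8)
    rng = np.random.default_rng(seed)
    flip = np.where(y == 1,
                    rng.random(y.shape) < p_sub,   # drop present labels
                    rng.random(y.shape) < p_add)   # add absent labels
    return y ^ flip.astype(np.uint8)
```

Choosing `p_add != p_sub` lets the simulation reflect operational scenarios, where missing labels and spurious labels typically occur at different rates.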
Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search for semantically similar images across different modalities. Existing CM-RSIR methods require a high quality and quantity of annotated training images. In operational scenarios, collecting a sufficient number of reliably labeled images is time-consuming, complex and costly, and can significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model the mutual information between different modalities in a self-supervised manner; ii) keep the distributions of the modality-specific feature spaces similar to each other; and iii) identify the most similar images within each modality without requiring any annotated training images. To this end, we propose a novel objective including three loss functions that simultaneously: i) maximize the mutual information of different modalities to preserve inter-modal similarity; ii) minimize the angular distance of multi-modal image tuples to eliminate inter-modal discrepancies; and iii) increase the cosine similarity of the most similar images within each modality to characterize intra-modal similarity. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/ss-cm-rsir.
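A toy numpy rendering of a three-term objective of this shape is given below; the InfoNCE-style mutual-information proxy, the nearest-neighbor intra-modal term, and the equal weights are our own illustrative assumptions and do not reproduce the paper's exact losses:

```python
import numpy as np

def l2n(x):
    """Row-wise L2 normalization."""
    x = np.asarray(x, float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def ssl_objective(za, zb, w=(1.0, 1.0, 1.0)):
    """Toy three-term objective for paired modality embeddings za, zb
    (rows are the same scenes observed in two modalities).

    term1: InfoNCE-style cross-modal term (mutual-information proxy);
    term2: mean angular distance of paired embeddings (inter-modal gap);
    term3: negative cosine similarity of each image's nearest intra-modal
           neighbor (pulls the most similar images in a modality together).
    """
    za, zb = l2n(za), l2n(zb)
    sim = za @ zb.T                           # cross-modal cosine sims
    logits = np.exp(sim)
    term1 = -np.log(np.diag(logits) / logits.sum(1)).mean()
    term2 = np.arccos(np.clip(np.diag(sim), -1, 1)).mean()
    intra = za @ za.T - 2 * np.eye(len(za))   # mask out self-similarity
    term3 = -intra.max(1).mean()
    return w[0] * term1 + w[1] * term2 + w[2] * term3
```

As expected, the objective is lower when corresponding rows of the two modalities are aligned than when they are mismatched.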
This paper introduces a novel deep metric learning based semi-supervised regression (DML-S2R) method for parameter estimation problems. The proposed DML-S2R method aims to mitigate the problem of insufficient labeled samples without collecting any additional samples with target values. To this end, it consists of two main steps: i) pairwise similarity modeling with scarce labeled data; and ii) triplet-based metric learning with abundant unlabeled data. The first step aims to model pairwise sample similarities by using a small number of labeled samples. This is achieved by estimating the target value differences of labeled samples with a Siamese neural network (SNN). The second step aims to learn a triplet-based metric space (in which similar samples are close to each other and dissimilar samples are far apart) when the number of labeled samples is insufficient. This is achieved by employing the SNN of the first step for triplet-based deep metric learning that exploits not only labeled samples but also unlabeled ones. For the end-to-end training of DML-S2R, we investigate an alternating learning strategy for the two steps. Due to this strategy, the information encoded in each step becomes guidance for the learning phase of the other step. The experimental results confirm the success of DML-S2R compared to state-of-the-art semi-supervised regression methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/dml-s2r.
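How a pairwise difference estimator can drive triplet mining over unlabeled data can be sketched as follows; treating the smallest predicted target difference as the positive and the largest as the negative is a simplifying assumption of this sketch, not the paper's exact procedure:

```python
import numpy as np

def mine_triplets(diff_fn, n):
    """Form (anchor, positive, negative) index triplets over a pool of n
    (possibly unlabeled) samples using a pairwise difference estimator
    diff_fn(i, j), e.g. the Siamese network of step 1. The sample with
    the smallest predicted target difference to the anchor becomes the
    positive, the one with the largest becomes the negative."""
    triplets = []
    for a in range(n):
        others = [j for j in range(n) if j != a]
        d = [abs(diff_fn(a, j)) for j in others]
        pos = others[int(np.argmin(d))]
        neg = others[int(np.argmax(d))]
        triplets.append((a, pos, neg))
    return triplets
```

Because `diff_fn` only needs inputs, not targets, the triplets can be mined from abundant unlabeled samples, which is the point of the second step.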
Learning the similarity between remote sensing (RS) images forms the basis of content-based RS image retrieval (CBIR). Recently, deep metric learning approaches that map the semantic similarity of images into an embedding (metric) space have become very popular. A common approach for learning the metric space relies on the selection of triplets of similar (positive) and dissimilar (negative) images with respect to a reference image called an anchor. Selecting triplets is a difficult task, particularly for multi-label RS CBIR, where each training image is annotated by multiple class labels. To address this problem, in this paper we propose a novel triplet sampling method in the framework of deep neural networks (DNNs) defined for multi-label RS CBIR problems. The proposed method selects a small set of the most representative and informative triplets based on two main steps. In the first step, a set of anchors that are diverse to each other in the embedding space is selected from the current mini-batch using an iterative algorithm. In the second step, different positive and negative images are chosen for each anchor by evaluating the relevance, hardness and diversity of the images among each other based on a novel strategy. Experimental results obtained on two multi-label benchmark archives show that selecting the most informative and representative triplets in the context of DNNs leads to: i) reduced computational complexity of the DNN training phase without any significant loss in performance; and ii) an increase in learning speed, since informative triplets allow fast convergence. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/image-reetrieval-from-tropls.
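The first step (anchors that are diverse to each other in the embedding space) can be sketched with greedy farthest-point selection; this particular iterative algorithm is an assumption standing in for the paper's anchor-selection procedure:

```python
import numpy as np

def select_diverse_anchors(emb, n_anchors):
    """Greedy farthest-point selection of anchors in the embedding space:
    start from an arbitrary sample, then repeatedly add the sample whose
    minimum distance to the already chosen anchors is largest."""
    emb = np.asarray(emb, float)
    chosen = [0]
    d = np.linalg.norm(emb - emb[0], axis=1)  # distance to nearest anchor
    while len(chosen) < n_anchors:
        nxt = int(d.argmax())
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(emb - emb[nxt], axis=1))
    return chosen
```

Each anchor would then be paired with positives and negatives scored by relevance, hardness and diversity, per the second step of the method.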
The development of accurate methods for multi-label classification (MLC) of remote sensing (RS) images is one of the most important research topics in RS. Methods based on deep convolutional neural networks (CNNs) have shown strong performance on RS MLC problems. However, CNN-based methods usually require a large number of reliable training images annotated with multiple land-cover class labels. Collecting such data is time-consuming and expensive. To address this problem, publicly available thematic products, which may include noisy labels, can be used to annotate RS images at zero labeling cost. However, multi-label noise (which can be associated with wrong and missing label annotations) can distort the learning process of MLC algorithms. The detection and correction of label noise are challenging tasks, especially in a multi-label scenario, where each image can be associated with more than one label. To address this problem, we propose a novel noise-robust collaborative multi-label learning (RCML) method to alleviate the adverse effects of multi-label noise during the training phase of a CNN model. RCML identifies, ranks and excludes noisy multi-labels in RS images based on three main modules: 1) the discrepancy module; 2) the group lasso module; and 3) the swap module. The discrepancy module ensures that the two networks learn diverse features while producing the same predictions. The task of the group lasso module is to detect potentially noisy labels assigned to multi-labeled training images, while the swap module is devoted to exchanging the ranking information between the two networks. Unlike existing methods that make assumptions about the noise distribution, our proposed RCML does not make any prior assumption about the type of noise in the training set. Our code is publicly available at: http://www.noisy-labels-in-rs.org
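The identify-rank-exclude idea can be pictured with a small numpy sketch; the plain loss ranking below is a stand-in for the group lasso module, and the fixed exclusion ratio is an assumed hyperparameter:

```python
import numpy as np

def rank_and_exclude(losses, y, exclude_ratio):
    """Rank assigned labels by loss and exclude potentially noisy ones.

    losses: per-image, per-class losses (images x classes) from one network;
    y:      multi-hot annotations; only assigned labels (y == 1) are ranked;
    exclude_ratio: fraction of assigned labels treated as noisy (a stand-in
    for the group-lasso based detection of the paper).

    Returns a mask with the same shape as y where excluded labels are 0.
    In collaborative training, each network would apply the mask derived
    from the *other* network's ranking (the role of the swap module).
    """
    losses = np.asarray(losses, float)
    y = np.asarray(y)
    mask = y.copy()
    idx = np.argwhere(y == 1)
    k = int(np.ceil(exclude_ratio * len(idx)))
    order = np.argsort([losses[i, j] for i, j in idx])[::-1]
    for i, j in idx[order[:k]]:  # highest-loss assigned labels first
        mask[i, j] = 0
    return mask
```

The masked labels would simply not contribute to the classification loss, limiting the influence of noisy annotations on the CNN weights.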
With the proliferation of deep generative models, deepfakes are improving in quality and quantity every day. However, pristine videos contain subtle authenticity signals that are not replicated by state-of-the-art GANs. We contrast the movement in deepfakes and authentic videos by motion magnification towards building a generalized deepfake source detector. Sub-muscular motion in faces is rendered differently by different generative models, and this difference is reflected in their generative residue. Our approach exploits the difference between real motion and the amplified GAN fingerprints, by combining deep and traditional motion magnification, to detect whether a video is fake and, if so, its source generator. Evaluating our approach on two multi-source datasets, we obtain 97.17% and 94.03% for video source detection. We compare against the prior deepfake source detector and other complex architectures. We also analyze the importance of the magnification amount, phase extraction window, backbone network architecture, sample counts, and sample lengths. Finally, we report our results for different skin tones to assess bias.
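The simplest, linear form of motion magnification (amplifying deviations from a temporal baseline) can be sketched in a few lines; real pipelines use band-pass temporal filtering and phase-based processing, so this is only a toy illustration:

```python
import numpy as np

def magnify_motion(frames, alpha):
    """Linear Eulerian-style motion magnification: amplify each frame's
    deviation from the temporal mean, exaggerating subtle movement (and,
    for deepfakes, any temporally inconsistent generative residue)."""
    frames = np.asarray(frames, float)
    baseline = frames.mean(axis=0)  # crude temporal low-pass
    return baseline + alpha * (frames - baseline)
```

With `alpha = 1` the input is returned unchanged; larger values exaggerate whatever temporal variation, authentic or generated, is present in the pixels.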
Though impressive success has been witnessed in computer vision, deep learning still suffers from the domain shift challenge when the target domain for testing and the source domain for training do not share an identical distribution. To address this, domain generalization approaches aim to extract domain-invariant features that lead to a more robust model; increasing source domain diversity is therefore a key component of domain generalization. Style augmentation takes advantage of instance-specific feature statistics, which carry informative style characteristics, to synthesize novel domains. However, previous works either ignored the correlation between different feature channels or limited style augmentation to linear interpolation. In this work, we propose a novel augmentation method, called Correlated Style Uncertainty (CSU), that goes beyond linear interpolation of the style statistic space while preserving the essential correlation information. We validate our method's effectiveness through extensive experiments on multiple cross-domain classification tasks, including the widely used PACS, Office-Home and Camelyon17 datasets and the Duke-Market1501 instance retrieval task, obtaining significant improvements over state-of-the-art methods. The source code is available for public use.
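One way to go beyond linear interpolation of style statistics while keeping cross-channel correlation is to sample perturbations from the covariance of the batch's style statistics; the sketch below is our own numpy approximation of this idea, not the paper's CSU implementation:

```python
import numpy as np

def correlated_style_augment(feats, seed=0):
    """Style augmentation with correlated uncertainty (numpy sketch).

    feats: batch x channels x HW feature maps. Instance-wise channel means
    and stds are the style statistics. Instead of linearly interpolating
    the statistics of two instances, we estimate the covariance of the
    statistics across the batch, sample new correlated style offsets via a
    Cholesky factor, then re-standardize and re-style the features."""
    feats = np.asarray(feats, float)
    mu = feats.mean(-1)                        # B x C channel means
    sd = feats.std(-1) + 1e-6                  # B x C channel stds
    stats = np.concatenate([mu, sd], axis=1)   # B x 2C style statistics
    cov = np.cov(stats.T) + 1e-6 * np.eye(stats.shape[1])
    L = np.linalg.cholesky(cov)
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(stats.shape) @ L.T  # correlated perturbations
    new_mu, new_sd = np.split(stats + eps, 2, axis=1)
    normalized = (feats - mu[..., None]) / sd[..., None]
    return normalized * np.abs(new_sd)[..., None] + new_mu[..., None]
```

Because the perturbations are drawn with the statistics' full covariance rather than channel-independently, correlated channels shift together, which is the property linear interpolation alone cannot provide.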